SENSING TENSORS WITH GAUSSIAN FILTERS


STEPHANE CHRETIEN AND TIANWEN WEI 


Abstract. Sparse recovery from linear Gaussian measurements has been the subject of
much investigation since the breakthrough papers [6] and [11] on Compressed Sensing.
Application to sparse vectors and sparse matrices via least squares penalized with
sparsity-promoting norms is now well understood using tools such as Gaussian mean width,
statistical dimension and the notion of descent cones [22], [23]. Extension of these ideas to low rank
tensor recovery is starting to enjoy considerable interest due to its many potential applications
to Independent Component Analysis, Hidden Markov Models and Gaussian Mixture
Models [1], hyperspectral image analysis [25], to name a few. In this paper, we demonstrate
that the recent approach of [23] provides useful error bounds in the tensor setting using
the nuclear norm or the Romera-Paredes-Pontil [20] penalization.


1. Introduction 

Real tensors, i.e. multidimensional arrays of real numbers, have recently been a subject of
great interest in the applied mathematics community. We refer to [15] and m for modern 
references on this subject. It has become quite clear nowadays that real symmetric tensors such 
as cumulants up to fourth order play a very important role in many applications in statistics, 
machine learning and signal processing; see for instance m for a general survey. Research on 
applications of tensors has been increasing in recent years, with very important conceptual
contributions such as proposed in 0, H, E], g] and the very nice survey of applications in p. 
In particular, certain Gaussian Mixture Models (GMM) can be estimated using this approach. 
The same is also true for Independent Component Analysis (ICA) and Hidden Markov Models 
(HMM). Nonsymmetric tensors also occur frequently in applications as 3D images such as in
medical imaging and hyperspectral image processing [IT], [2T |. 

In some applications, the tensor is observed through the operation of random filtering, e.g. 
taking the scalar product with an i.i.d. Gaussian random vector. Gaussian random sensing 
has been thoroughly investigated in recent years and can be recast as sparse recovery for one 
dimensional tensors (i.e. vectors), low rank recovery for bidimensional tensors (i.e. matrices). 
See e.g. [23] for a tutorial on this topic. 

The goal of this short note is to show that the results of [23] can easily be extended to tensors. 
First, we consider nuclear norm minimization. Next, we consider recovery via minimization 
of the Romera-Paredes-Pontil functional.

2. Main facts about tensors 

Let $D$ and $n_1,\dots,n_D$ be positive integers. Let $\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}$ denote a $D$-dimensional
array of real numbers. We will also refer to such arrays as tensors.

2.1. Basic notations and operations. A subtensor of $\mathcal{X}$ is a tensor obtained by fixing some
of its coordinates. As an example, fixing one coordinate $i_d = k$ in $\mathcal{X}$ for some $k \in \{1,\dots,n_d\}$
yields a tensor in $\mathbb{R}^{n_1\times\cdots\times n_{d-1}\times n_{d+1}\times\cdots\times n_D}$. In the sequel, we will denote this subtensor of $\mathcal{X}$
by $\mathcal{X}_{i_d=k}$.

The fibers of a tensor are particular subtensors that have only one mode, i.e. obtained by 
fixing every coordinate except one. The mode-d fibers are the vectors 

$$\left(\mathcal{X}_{i_1,\dots,i_{d-1},i_d,i_{d+1},\dots,i_D}\right)_{i_d=1,\dots,n_d}.$$

They extend the notion of columns and rows from the matrix to the tensor framework. For a 
matrix, the mode-1 fibers are the columns and the mode-2 fibers are the rows. 

The mode-d matricization $\mathcal{X}_{(d)}$ of $\mathcal{X}$ is obtained by forming the matrix whose columns are the
mode-d fibers of the tensor, arranged in a cyclic ordering; see m for details.

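The mode-d multiplication of a tensor $\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}$ by a matrix $U \in \mathbb{R}^{n'_d\times n_d}$, denoted
by $\mathcal{X} \times_d U$, gives a tensor in $\mathbb{R}^{n_1\times\cdots\times n'_d\times\cdots\times n_D}$. It is defined as

$$(\mathcal{X} \times_d U)_{i_1,\dots,i_{d-1},i'_d,i_{d+1},\dots,i_D} = \sum_{i_d=1}^{n_d} \mathcal{X}_{i_1,\dots,i_{d-1},i_d,i_{d+1},\dots,i_D}\, U_{i'_d,i_d}.$$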

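Last, we denote by $\|\cdot\|_F$ the Frobenius norm, i.e.:

$$\|\mathcal{X}\|_F = \left(\sum_{i_1=1}^{n_1}\cdots\sum_{i_D=1}^{n_D} \mathcal{X}_{i_1,\dots,i_D}^2\right)^{1/2}, \qquad \forall\, \mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}.$$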
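The operations of this subsection can be sketched in NumPy as follows. This is only an illustrative sketch: the helper names `unfold`, `mode_product` and `frobenius_norm` are hypothetical, modes are 0-indexed, and the columns of the unfolding are arranged in NumPy's default order over the remaining modes, which is an assumption and need not match the cyclic ordering mentioned above.

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization: the columns are the mode-d fibers of X."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def mode_product(X, U, d):
    """Mode-d product X x_d U, with U of shape (n_d_new, n_d)."""
    Y = np.tensordot(U, X, axes=(1, d))  # the contracted mode becomes axis 0
    return np.moveaxis(Y, 0, d)          # move it back to position d

def frobenius_norm(X):
    """Square root of the sum of the squared entries."""
    return float(np.sqrt(np.sum(X ** 2)))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
U = rng.standard_normal((2, 4))
Y = mode_product(X, U, 1)
print(Y.shape)  # (3, 2, 5)
```

With this convention, the mode-d unfolding of the product is the matrix product of `U` with the mode-d unfolding of `X`, which gives a quick consistency check.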

2.2. Higher Order Singular Value Decomposition (HOSVD). The Tucker decomposition
of a tensor is a very useful decomposition. It can be chosen so that after appropriate
orthogonal transformations, one can reveal a tensor S hidden inside X enjoying interesting 
rank and orthogonality properties. In this contribution, we will make use of the HOSVD, a 
generalization of the matrix SVD to the tensor setting based on Tucker decomposition. 

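Theorem 2.1 (HOSVD [10]). Every tensor $\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}$ can be written as

$$\mathcal{X} = \mathcal{S}(\mathcal{X}) \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_D U^{(D)}, \tag{2.1}$$

where each $U^{(d)} \in \mathbb{R}^{n_d\times n_d}$ is an orthogonal matrix and $\mathcal{S}(\mathcal{X}) \in \mathbb{R}^{n_1\times\cdots\times n_D}$ is a tensor of the
same size as $\mathcal{X}$ with the following properties:

1. For all possible values of $d$, $\alpha$ and $\beta$ subject to $\alpha \neq \beta$, the subtensors $\mathcal{S}(\mathcal{X})_{i_d=\alpha}$ and
$\mathcal{S}(\mathcal{X})_{i_d=\beta}$ are orthogonal, i.e.

$$\langle \mathcal{S}(\mathcal{X})_{i_d=\alpha}, \mathcal{S}(\mathcal{X})_{i_d=\beta} \rangle = 0.$$

2. For all possible values of $d$, there holds:

$$\|\mathcal{S}(\mathcal{X})_{i_d=1}\|_F \ge \|\mathcal{S}(\mathcal{X})_{i_d=2}\|_F \ge \cdots \ge \|\mathcal{S}(\mathcal{X})_{i_d=n_d}\|_F \ge 0.$$

3. The quantities $\|\mathcal{S}(\mathcal{X})_{i_d=k}\|_F$ for $k = 1,\dots,n_d$ are the singular values of the mode-d
matricization $\mathcal{X}_{(d)}$ of $\mathcal{X}$ and the columns of $U^{(d)}$ are the corresponding singular vectors.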

Let $\otimes$ denote the standard Kronecker product for matrices. Then it follows from (2.1) that

$$\mathcal{X}_{(d)} = U^{(d)}\, \mathcal{S}_{(d)}(\mathcal{X}) \left(U^{(d+1)} \otimes \cdots \otimes U^{(D)} \otimes U^{(1)} \otimes \cdots \otimes U^{(d-1)}\right)^t, \tag{2.2}$$

where $\mathcal{S}_{(d)}(\mathcal{X})$ is the mode-d matricization of $\mathcal{S}(\mathcal{X})$. Taking the (usual) SVD of the matrix

$$\mathcal{X}_{(d)} = U^{(d)} \Sigma^{(d)} V^{(d)t}$$

and based on (2.2), we get

$$\mathcal{S}_{(d)}(\mathcal{X}) = \Sigma^{(d)} V^{(d)t} \left(U^{(d+1)} \otimes \cdots \otimes U^{(D)} \otimes U^{(1)} \otimes \cdots \otimes U^{(d-1)}\right).$$
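As an illustration of the HOSVD, the following NumPy sketch computes the factors from the SVD of each unfolding and then forms the core tensor. The function names (`unfold`, `mode_product`, `hosvd`) are hypothetical, modes are 0-indexed, and the column ordering of the unfolding is NumPy's default rather than the cyclic one; this choice does not affect the factors or the norms of the slices of the core.

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization with the mode-d fibers as columns."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def mode_product(X, U, d):
    """Mode-d product X x_d U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, d)), 0, d)

def hosvd(X):
    """Return a core tensor S and orthogonal factors with X = S x_1 U1 ... x_D UD."""
    # Left singular vectors of each unfolding give the orthogonal factors.
    factors = [np.linalg.svd(unfold(X, d), full_matrices=True)[0]
               for d in range(X.ndim)]
    # The core is obtained by multiplying each mode by the transposed factor.
    S = X
    for d, U in enumerate(factors):
        S = mode_product(S, U.T, d)
    return S, factors

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
S, factors = hosvd(X)

# Rebuild X from the core and the factors.
X_rebuilt = S
for d, U in enumerate(factors):
    X_rebuilt = mode_product(X_rebuilt, U, d)

# Norms of the mode-0 slices of the core versus singular values of the unfolding.
slice_norms = np.linalg.norm(unfold(S, 0), axis=1)
singular_values = np.linalg.svd(unfold(X, 0), compute_uv=False)
```

In this sketch, `X_rebuilt` should agree with `X` up to rounding, and `slice_norms` should agree with `singular_values`, in line with properties 2 and 3 of Theorem 2.1.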

2.3. The spectrum. The mode-d spectrum is defined as the vector of singular values of
$\mathcal{X}_{(d)}$ and we will denote it by $\sigma^{(d)}(\mathcal{X})$. Notice that this construction implies that $\mathcal{S}(\mathcal{X})$ has
orthonormal fibers for every mode. With a slight abuse of notation, we will denote by $\sigma$
the mapping which to each tensor $\mathcal{X}$ assigns the vector $\frac{1}{\sqrt{D}}\left(\sigma^{(1)},\dots,\sigma^{(D)}\right)$ of all mode-d
singular spectra.
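A minimal NumPy sketch of the mapping $\sigma$ follows. The names `mode_spectrum` and `sigma` are hypothetical, and the example assumes that each $n_d$ is at most the product of the other dimensions, so that every mode-d spectrum has exactly $n_d$ entries.

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization with the mode-d fibers as columns."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def mode_spectrum(X, d):
    """Singular values of the mode-d unfolding, in decreasing order."""
    return np.linalg.svd(unfold(X, d), compute_uv=False)

def sigma(X):
    """Concatenation of all mode spectra, scaled by 1/sqrt(D)."""
    D = X.ndim
    return np.concatenate([mode_spectrum(X, d) for d in range(D)]) / np.sqrt(D)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
s = sigma(X)
print(s.shape)  # (12,)
```

Since each mode-d spectrum has Euclidean norm equal to the Frobenius norm of the tensor, the scaling by $1/\sqrt{D}$ makes the Euclidean norm of `sigma(X)` equal to the Frobenius norm of `X`.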

2.4. Tensor norms. We can define several tensor norms on $\mathbb{R}^{n_1\times\cdots\times n_D}$. The first one is a
natural extension of the Frobenius norm or Hilbert-Schmidt norm from matrices to tensors.
We start by defining the following scalar product on $\mathbb{R}^{n_1\times\cdots\times n_D}$:

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{n_1}\cdots\sum_{i_D=1}^{n_D} \mathcal{X}_{i_1,\dots,i_D}\, \mathcal{Y}_{i_1,\dots,i_D}.$$

Using this scalar product, we can also define the Frobenius norm as

$$\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}.$$

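One may also define an "operator norm" in the same manner as for matrices, as follows:

$$\|\mathcal{X}\| = \max_{\substack{u^{(d)} \in \mathbb{R}^{n_d},\ \|u^{(d)}\|_2 = 1 \\ d=1,\dots,D}} \langle \mathcal{X}, u^{(1)} \otimes \cdots \otimes u^{(D)} \rangle.$$

We also define the nuclear norm $\|\cdot\|_*$ by

$$\|\mathcal{X}\|_* = \max_{\substack{\mathcal{Y} \in \mathbb{R}^{n_1\times\cdots\times n_D} \\ \|\mathcal{Y}\| \le 1}} \langle \mathcal{X}, \mathcal{Y} \rangle. \tag{2.3}$$

The norm $\|\cdot\|_*$ can be interpreted as a generalization of the nuclear norm for matrices. They
can be shown to be equal for certain classes of tensors, such as Orthogonally Decomposable
tensors [7]. Another interesting function is the Romera-Paredes-Pontil functional.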

3. Tensor recovery based on random measurements using convex optimization 

3.1. Previous works. Our goal is to estimate an unknown but structured $n_1\times\cdots\times n_D$
tensor $\mathcal{X}^\#$ from $m$ linear observations given by

$$y_i = \langle \mathcal{G}_i, \mathcal{X}^\# \rangle, \qquad i = 1,\dots,m.$$

Although the unknown tensor of interest $\mathcal{X}^\#$ resides in an extremely high dimensional data
space, it has a low-rank structure in many applications. The general problem of estimating
a low rank tensor has applications in many different areas, both theoretical and applied. We
refer the readers to El HU- 
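The measurement model can be simulated with a few lines of NumPy. The sketch below is only an illustration under stated assumptions: the helper names `low_tucker_rank_tensor` and `gaussian_measurements`, the dimensions, the ranks and the number of measurements are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def low_tucker_rank_tensor(shape, ranks, rng):
    """Random tensor built as a small Gaussian core multiplied by a factor in each mode."""
    X = rng.standard_normal(ranks)
    for d, (n, r) in enumerate(zip(shape, ranks)):
        A = rng.standard_normal((n, r))
        # Mode-d product of X with the n x r factor A.
        X = np.moveaxis(np.tensordot(A, X, axes=(1, d)), 0, d)
    return X

def gaussian_measurements(X_sharp, m, rng):
    """Draw m Gaussian tensors G_i and return them with y_i = <G_i, X_sharp>."""
    G = rng.standard_normal((m,) + X_sharp.shape)
    # Contract every mode of X_sharp against the trailing modes of G.
    y = np.tensordot(G, X_sharp, axes=X_sharp.ndim)
    return G, y

rng = np.random.default_rng(0)
X_sharp = low_tucker_rank_tensor((6, 7, 8), (2, 2, 2), rng)
G, y = gaussian_measurements(X_sharp, m=50, rng=rng)
print(y.shape)  # (50,)
```

Each entry of `y` is the scalar product of one Gaussian tensor with the unknown tensor, which is the "random filtering" described in the introduction.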

For a tensor of low Tucker rank, the matrix unfolding along each mode has low rank.
Given observations $y_1,\dots,y_m$, we would like to attempt to recover $\mathcal{X}^\#$ by minimizing some
combination of the ranks of the unfoldings, over all tensors $\mathcal{X}$ that are consistent with our
observations. This yields the following optimization problem:

$$\min_{\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}} \sum_{d=1}^{D} \operatorname{rank}(\mathcal{X}_{(d)}) \quad \text{s.t.} \quad \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i, \quad i = 1,\dots,m. \tag{3.4}$$
Optimization problem (3.4), although intuitive, is non-convex and NP-hard. A natural convex
surrogate of (3.4) can be obtained by replacing the rank with matrix nuclear norms [12]. The
resulting optimization problem becomes

$$\min_{\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}} \sum_{d=1}^{D} \|\mathcal{X}_{(d)}\|_* \quad \text{s.t.} \quad \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i, \quad i = 1,\dots,m. \tag{3.5}$$
This optimization problem was first introduced by [Hi El] and has been used successfully in 
a number of applications CZI- 
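To make the two objectives concrete, the following NumPy sketch evaluates the sum of the ranks of the unfoldings and its convex surrogate, the sum of their nuclear norms, on a random tensor of low Tucker rank. The helper names and the test tensor are hypothetical, and the sketch only evaluates the objectives; it does not solve either optimization problem.

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization with the mode-d fibers as columns."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def tucker_ranks(X):
    """Numerical rank of each unfolding."""
    return [int(np.linalg.matrix_rank(unfold(X, d))) for d in range(X.ndim)]

def sum_nuclear_norms(X):
    """Sum over the modes of the nuclear norm of the unfolding."""
    return float(sum(np.linalg.norm(unfold(X, d), 'nuc') for d in range(X.ndim)))

def low_tucker_rank_tensor(shape, ranks, rng):
    """Small Gaussian core multiplied by a Gaussian factor in each mode."""
    X = rng.standard_normal(ranks)
    for d, (n, r) in enumerate(zip(shape, ranks)):
        A = rng.standard_normal((n, r))
        X = np.moveaxis(np.tensordot(A, X, axes=(1, d)), 0, d)
    return X

rng = np.random.default_rng(0)
A = low_tucker_rank_tensor((6, 7, 8), (2, 2, 2), rng)
B = low_tucker_rank_tensor((6, 7, 8), (2, 2, 2), rng)
ranks_A = tucker_ranks(A)
midpoint_value = sum_nuclear_norms((A + B) / 2)
average_value = (sum_nuclear_norms(A) + sum_nuclear_norms(B)) / 2
```

The surrogate is convex, so `midpoint_value` should not exceed `average_value`, whereas the rank objective has no such property.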

In this work, we are going to consider a related but somewhat different optimization problem:

$$\min_{\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}} \|\mathcal{X}\|_* \quad \text{s.t.} \quad \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i, \quad i = 1,\dots,m, \tag{3.6}$$

where the tensor nuclear norm $\|\cdot\|_*$ is defined in (2.3). We point out that in general $\|\mathcal{X}\|_* \neq
\frac{1}{D}\sum_{d=1}^{D} \|\mathcal{X}_{(d)}\|_*$, but the two quantities coincide when $\mathcal{X}$ is an orthogonally decomposable
tensor. The main reason to consider (3.6) is that the method established in [23] for Gaussian
random filters can be easily generalized to the tensor framework.


3.2. Recovery by nuclear norm minimization. 


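Theorem 3.1. Assume that $\mathcal{G}_i$, $i = 1,\dots,m$, are independent $n_1\times\cdots\times n_D$ random Gaussian
tensors with independent $\mathcal{N}(0,1)$ entries. Let $\hat{\mathcal{X}}$ be a solution of the convex program

$$\min_{\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}} \|\mathcal{X}\|_* \quad \text{s.t.} \quad \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i, \quad i = 1,\dots,m.$$

Then

$$\mathbb{E}\, \|\hat{\mathcal{X}} - \mathcal{X}^\#\|_F \le 2\sqrt{2\pi}\; \frac{\sqrt{n_1} + \cdots + \sqrt{n_D}}{\sqrt{m}}\; \|\mathcal{X}^\#\|_*. \tag{3.7}$$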


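Proof. We consider the following set of $n_1\times\cdots\times n_D$ tensors:

$$K = \{\mathcal{X} : \|\mathcal{X}\|_* \le \|\mathcal{X}^\#\|_*\}.$$

Applying Theorem 6.2 from [23], we obtain

$$\mathbb{E} \sup_{\substack{\mathcal{X} \in K \\ \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i,\ i=1,\dots,m}} \|\mathcal{X} - \mathcal{X}^\#\|_F \le \sqrt{2\pi}\; \frac{w(K)}{\sqrt{m}},$$

where $w(K)$ denotes the Gaussian mean width [23] of the set $K$. By the symmetry of $K$, we have

$$w(K) = \mathbb{E} \sup_{\mathcal{X} \in K-K} \langle \mathcal{G}, \mathcal{X} \rangle = 2\, \mathbb{E} \sup_{\mathcal{X} \in K} \langle \mathcal{G}, \mathcal{X} \rangle,$$

where $\mathcal{G}$ is a Gaussian random tensor with $\mathcal{N}(0,1)$ entries. Then using the inequality $\langle \mathcal{G}, \mathcal{X} \rangle \le
\|\mathcal{G}\| \cdot \|\mathcal{X}\|_*$ and Lemma 4.1, we obtain

$$w(K) \le 2\, \mathbb{E} \sup_{\mathcal{X} \in K} \|\mathcal{G}\| \cdot \|\mathcal{X}\|_* \le 2 \left(\sqrt{n_1} + \cdots + \sqrt{n_D}\right) \|\mathcal{X}^\#\|_*.$$

Then bound (3.7) follows.

□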
3.3. The Romera-Paredes-Pontil relaxation. In the sequel, let us denote by $N = \sum_{d=1}^{D} n_d$.
The Romera-Paredes-Pontil function on $\mathbb{R}^N$, denoted by $\omega_\alpha^{**}$, is the convex envelope of the
cardinality function $\|\cdot\|_0$ on the $\ell_2$-ball of radius $\alpha$. Its conjugate $\omega_\alpha^{*}$ is defined as

$$\omega_\alpha^{*}(g) = \sup_{\|s\|_2 \le \alpha} \langle g, s \rangle - \|s\|_0. \tag{3.8}$$
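The conjugate in (3.8) can be evaluated exactly by sorting. For a fixed support of size $r$, the supremum of $\langle g, s\rangle$ over the ball of radius $\alpha$ is $\alpha$ times the Euclidean norm of $g$ restricted to that support, so it suffices to keep the $r$ largest entries of $|g|$ and to maximize $\alpha\|g_{1:r}\|_2 - r$ over $r$ (with value $0$ for $s = 0$). The sketch below implements this closed form and checks it against a brute-force search over supports; the function names are hypothetical.

```python
import itertools
import numpy as np

def omega_conj(g, alpha):
    """Closed form: max over r of alpha * ||r largest entries of |g|||_2 - r, and 0 for r = 0."""
    a = np.sort(np.abs(g))[::-1]              # magnitudes in decreasing order
    partial_norms = np.sqrt(np.cumsum(a ** 2))
    r = np.arange(1, len(a) + 1)
    return max(0.0, float(np.max(alpha * partial_norms - r)))

def omega_conj_bruteforce(g, alpha):
    """Same quantity, computed by enumerating every possible support."""
    best = 0.0
    n = len(g)
    for r in range(1, n + 1):
        for support in itertools.combinations(range(n), r):
            best = max(best, alpha * float(np.linalg.norm(g[list(support)])) - r)
    return best

rng = np.random.default_rng(0)
g = rng.standard_normal(6)
alpha = 2.0
value = omega_conj(g, alpha)
value_bf = omega_conj_bruteforce(g, alpha)
```

The brute-force version is exponential in the dimension and is only meant as a check on small vectors.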

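By conjugate duality, we have for all $g$, $s$ in $\mathbb{R}^N$

$$\langle g, s \rangle \le \omega_\alpha^{*}(g) + \omega_\alpha^{**}(s). \tag{3.9}$$

Taking $g$ such that $\omega_\alpha^{*}(g) = 1$ and $\omega_\alpha^{**}(s) = 1$, we have

$$\langle g, s \rangle \le 2.$$

Therefore, for any $g$ and $s$, we have

$$\langle g, s \rangle \le 2\, \omega_\alpha^{*}(g)\, \omega_\alpha^{**}(s). \tag{3.10}$$

Moreover, by the Von Neumann's trace inequality for tensors [8], we have

$$\langle \mathcal{G}, \mathcal{X} \rangle \le \langle \sigma(\mathcal{G}), \sigma(\mathcal{X}) \rangle. \tag{3.11}$$

Combining (3.10) and (3.11), we obtain that

$$\langle \mathcal{G}, \mathcal{X} \rangle \le 2\, \omega_\alpha^{*}(\sigma(\mathcal{G}))\, \omega_\alpha^{**}(\sigma(\mathcal{X})). \tag{3.12}$$

Based on these results, we obtain the following theorem.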


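Theorem 3.2. Assume that $\mathcal{G}_i$, $i = 1,\dots,m$, are independent $n_1\times\cdots\times n_D$ random Gaussian
tensors with independent $\mathcal{N}(0,1)$ entries. Let $\hat{\mathcal{X}}$ be a solution of the convex program

$$\min_{\mathcal{X} \in \mathbb{R}^{n_1\times\cdots\times n_D}} \omega_\alpha^{**}(\sigma(\mathcal{X})) \quad \text{s.t.} \quad \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i, \quad i = 1,\dots,m. \tag{3.13}$$

Then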


E 


\X-X*\\ F 


€ 


64 ol\Fk 


m 




D + l 


N(1 + N) d - 2 d 

17 


(3.14) 


+ 


2 d+1 - (1 + N) d+1 \\ N(l + N) 
D(D + 1) 


Of*')), 


where $n_* = \max\{n_1,\dots,n_D\}$, $p = C_{D-1}^{r-1}\, 3^{r n_*}$ and $q = r n_* + (r/D)^D$.


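Proof. We consider the following set of $n_1\times\cdots\times n_D$ tensors:

$$K = \{\mathcal{X} : \omega_\alpha^{**}(\sigma(\mathcal{X})) \le \omega_\alpha^{**}(\sigma(\mathcal{X}^\#))\}.$$

Exactly as for the proof of Theorem 3.1, applying Theorem 6.2 from [23], we obtain

$$\mathbb{E} \sup_{\substack{\mathcal{X} \in K \\ \langle \mathcal{G}_i, \mathcal{X} \rangle = y_i,\ i=1,\dots,m}} \|\mathcal{X} - \mathcal{X}^\#\|_F \le \sqrt{2\pi}\; \frac{w(K)}{\sqrt{m}},$$

and using the obvious symmetry of $K$, we have

$$w(K) = \mathbb{E} \sup_{\mathcal{X} \in K-K} \langle \mathcal{G}, \mathcal{X} \rangle = 2\, \mathbb{E} \sup_{\mathcal{X} \in K} \langle \mathcal{G}, \mathcal{X} \rangle,$$

where $\mathcal{G}$ is a Gaussian random tensor with $\mathcal{N}(0,1)$ entries. Then using inequality (3.12), we
obtain

$$w(K) \le 4\, \mathbb{E}\left[\omega_\alpha^{*}(\sigma(\mathcal{G}))\right] \omega_\alpha^{**}(\sigma(\mathcal{X}^\#)),$$

and using Lemma 4.3 we get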


w 


, , r- (( r —— (1 + N) d+1 - 2 d+1 1 / N(l + N) d -2 d 

(A) <32V2 a I ( \/log(p)-- JT+i - + + - f) - 


+ 


2 D+l - (1 + N) d+1 \\ N(l + N) 


D(D + 1) 
Then bound (3.14) follows.






□ 


4. Some results on Gaussian tensors 

4.1. The spectral norm of a Gaussian tensor. 

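Lemma 4.1. Let

$$\mathcal{X} = \left(\mathcal{X}_{i_1,\dots,i_D}\right)_{i_1=1,\dots,n_1,\ \dots,\ i_D=1,\dots,n_D}$$

be a tensor with i.i.d. standard Gaussian entries. Then

$$\mathbb{E}\, \|\mathcal{X}\| \le \sum_{d=1}^{D} \sqrt{n_d}.$$

Proof. Consider the following stochastic process indexed by $(u^{(1)},\dots,u^{(D)})$, where $u^{(d)} \in
S^{n_d-1}$ for $d = 1,\dots,D$:

$$X_{u^{(1)},\dots,u^{(D)}} = \langle \mathcal{X}, u^{(1)} \otimes \cdots \otimes u^{(D)} \rangle.$$

Let us compute the variance of the increments of this random process. For any $u^{(d)}, v^{(d)} \in
S^{n_d-1}$ for $d = 1,\dots,D$, we have

$$\mathbb{E}\left[\left(X_{u^{(1)},\dots,u^{(D)}} - X_{v^{(1)},\dots,v^{(D)}}\right)^2\right] = \mathbb{E}\left[\left(\sum_{i_1,\dots,i_D=1}^{n_1,\dots,n_D} \mathcal{X}_{i_1,\dots,i_D} \left(\prod_{d=1}^{D} u^{(d)}_{i_d} - \prod_{d=1}^{D} v^{(d)}_{i_d}\right)\right)^2\right]$$

and since the entries of $\mathcal{X}$ are independent with unit variance, we get

$$\mathbb{E}\left[\left(X_{u^{(1)},\dots,u^{(D)}} - X_{v^{(1)},\dots,v^{(D)}}\right)^2\right] = \sum_{i_1,\dots,i_D=1}^{n_1,\dots,n_D} \left(\prod_{d=1}^{D} u^{(d)}_{i_d} - \prod_{d=1}^{D} v^{(d)}_{i_d}\right)^2 = \left\|u^{(1)} \otimes \cdots \otimes u^{(D)} - v^{(1)} \otimes \cdots \otimes v^{(D)}\right\|_F^2.$$

Moreover, since

$$\left\|u^{(1)} \otimes \cdots \otimes u^{(D)} - v^{(1)} \otimes \cdots \otimes v^{(D)}\right\|_F^2 \le \sum_{i=1}^{D} \left\|u^{(i)} - v^{(i)}\right\|_2^2,$$

we obtain that

$$\mathbb{E}\left[\left(X_{u^{(1)},\dots,u^{(D)}} - X_{v^{(1)},\dots,v^{(D)}}\right)^2\right] \le \sum_{i=1}^{D} \left\|u^{(i)} - v^{(i)}\right\|_2^2.$$

Now let us consider another random process indexed by $(u^{(1)},\dots,u^{(D)})$:

$$Y_{u^{(1)},\dots,u^{(D)}} = \sum_{d=1}^{D} \langle g^{(d)}, u^{(d)} \rangle,$$

where $g^{(d)} \sim \mathcal{N}(0, I_{n_d})$, $d = 1,\dots,D$, are independent Gaussian random vectors. It is easy to
see that the variance of the increments of $Y$ is

$$\mathbb{E}\left[\left(Y_{u^{(1)},\dots,u^{(D)}} - Y_{v^{(1)},\dots,v^{(D)}}\right)^2\right] = \mathbb{E}\left[\left(\sum_{d=1}^{D} \langle g^{(d)}, u^{(d)} - v^{(d)} \rangle\right)^2\right] = \sum_{d=1}^{D} \left\|u^{(d)} - v^{(d)}\right\|_2^2.$$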

Therefore, the variance of the increments of Y is greater than or equal to the variance of the 
increments of X. Therefore, we can apply Slepian’s lemma m and obtain 


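$$\mathbb{E}\left[\|\mathcal{X}\|\right] = \mathbb{E}\left[\sup_{u^{(i)}:\ \|u^{(i)}\|=1} X_{u^{(1)},\dots,u^{(D)}}\right] \le \mathbb{E}\left[\sup_{u^{(i)}:\ \|u^{(i)}\|=1} Y_{u^{(1)},\dots,u^{(D)}}\right] = \mathbb{E}\left[\sup_{u^{(i)}:\ \|u^{(i)}\|=1} \sum_{i=1}^{D} \langle g^{(i)}, u^{(i)} \rangle\right] = \mathbb{E}\left[\sum_{i=1}^{D} \|g^{(i)}\|_2\right]$$

and by Jensen's inequality

$$\mathbb{E}\left[\|\mathcal{X}\|\right] \le \sum_{i=1}^{D} \sqrt{n_i}.$$

□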
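The bound of Lemma 4.1 can be probed numerically. Computing the tensor operator norm exactly is hard in general, so the sketch below only produces a lower bound on it by alternating maximization over the unit vectors $u^{(d)}$ (a higher-order power iteration), and compares the average of this lower bound with $\sum_d \sqrt{n_d}$. This is a sanity check under stated assumptions, not a proof: the helper names, the dimensions and the numbers of restarts and sweeps are hypothetical choices.

```python
import numpy as np

def contract_except(X, us, d):
    """Contract X with every vector in us except the one for mode d."""
    T = X
    # Go from the last mode down so that the remaining axis indices stay valid.
    for k in reversed(range(X.ndim)):
        if k != d:
            T = np.tensordot(T, us[k], axes=(k, 0))
    return T

def spectral_norm_lower_bound(X, rng, restarts=3, sweeps=30):
    """Lower bound on max <X, u1 x ... x uD> via alternating updates of unit vectors."""
    best = 0.0
    for _ in range(restarts):
        us = [rng.standard_normal(n) for n in X.shape]
        us = [u / np.linalg.norm(u) for u in us]
        for _ in range(sweeps):
            for d in range(X.ndim):
                v = contract_except(X, us, d)
                us[d] = v / np.linalg.norm(v)   # best unit vector for mode d
        best = max(best, float(contract_except(X, us, 0) @ us[0]))
    return best

rng = np.random.default_rng(0)
shape = (8, 9, 10)
estimates = [spectral_norm_lower_bound(rng.standard_normal(shape), rng)
             for _ in range(10)]
mean_estimate = float(np.mean(estimates))
bound = float(sum(np.sqrt(n) for n in shape))
print(mean_estimate, bound)
```

Because the alternating scheme may stop at a local maximum, `mean_estimate` underestimates the expected operator norm; the check is only that it stays below `bound`, as the lemma predicts.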
































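4.2. The entropy of the set of tensors with Frobenius norm equal to $\alpha$ and given
Tucker rank. Define the set

$$T_{r,\alpha} = \left\{\mathcal{W} \in \mathbb{R}^{n_1\times\cdots\times n_D} \ \Big|\ \|\mathcal{W}\|_F \le \alpha,\ \sum_{d=1}^{D} \operatorname{rank}(\mathcal{W}_{(d)}) = r\right\}.$$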


Then, we have that 

Theorem 4.2. For each $\varepsilon$, the set $T_{r,\alpha}$ has an $\varepsilon'$-net of size $N(\varepsilon)$ with


/ o \ rn. /q \ rn*+{r/D) L 

(t) 


where $n_* = \max\{n_1,\dots,n_D\}$ and

$$\varepsilon' = \varepsilon + \alpha\left((1 + r\varepsilon)^D - 1\right).$$

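Proof. For any $n$ and $\nu$ in $\mathbb{N}$ with $n > \nu$, let $O_{n,\nu}$ be the Stiefel manifold defined by

$$O_{n,\nu} = \left\{U \in \mathbb{R}^{n\times\nu} \mid U^t U = I_\nu\right\}.$$

Then, it was proved in [5, Lemma 3.1] that $O_{n,\nu}$ admits an $\varepsilon$-covering of size $(9/\varepsilon)^{n\nu}$.
Let $\mathcal{P}(r,D)$ denote the set of integer partitions of $r$ using no more than $D$ integers. Now for
each $\nu = (\nu_1,\dots,\nu_D) \in \mathcal{P}(r,D)$, define the set

$$\mathcal{E}(\nu) = \left\{S \in \mathbb{R}^{\nu_1\times\cdots\times\nu_D} \mid \|S\|_F = \alpha\right\}.$$

It is well known [16] that the unit sphere of $\mathbb{R}^m$ admits an $\varepsilon$-net of size less than $(3/\varepsilon)^m$. Using
this fact, we easily obtain an $\varepsilon$-net of $\mathcal{E}(\nu)$ of size no larger than $(3\alpha/\varepsilon)^{\nu_1\times\cdots\times\nu_D}$.

Next, we are going to determine the size of an $\varepsilon$-net covering the set

$$T(\nu) = \left\{S \times_1 U^{(1)} \times_2 \cdots \times_D U^{(D)} \mid S \in \mathcal{E}(\nu),\ U^{(d)} \in O_{n_d,\nu_d},\ d = 1,\dots,D\right\}.$$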

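Denote $\Delta S = S - S_0$ and $\Delta U^{(d)} = U^{(d)} - U_0^{(d)}$ for $d = 1,\dots,D$. Then

$$\begin{aligned}
\mathcal{W} &= S \times_1 U^{(1)} \times_2 \cdots \times_D U^{(D)} \\
&= (S_0 + \Delta S) \times_1 \left(U_0^{(1)} + \Delta U^{(1)}\right) \times_2 \cdots \times_D \left(U_0^{(D)} + \Delta U^{(D)}\right) \\
&= S_0 \times_1 \left(U_0^{(1)} + \Delta U^{(1)}\right) \times_2 \cdots \times_D \left(U_0^{(D)} + \Delta U^{(D)}\right) \\
&\quad + \Delta S \times_1 \left(U_0^{(1)} + \Delta U^{(1)}\right) \times_2 \cdots \times_D \left(U_0^{(D)} + \Delta U^{(D)}\right),
\end{aligned}$$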


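in which

$$\begin{aligned}
S_0 \times_1 \left(U_0^{(1)} + \Delta U^{(1)}\right) \times_2 \cdots \times_D \left(U_0^{(D)} + \Delta U^{(D)}\right)
&= S_0 \times_1 U_0^{(1)} \times_2 \cdots \times_D U_0^{(D)}
+ \sum_{d=1}^{D} S_0 \times_d \Delta U^{(d)} \prod_{\substack{i=1 \\ i \neq d}}^{D} \times_i U_0^{(i)} + \cdots \\
&\quad + \sum_{\substack{d_1,\dots,d_k = 1 \\ d_p \neq d_q,\ \forall p \neq q}}^{D} S_0 \prod_{j=1}^{k} \times_{d_j} \Delta U^{(d_j)} \prod_{i \notin \{d_1,\dots,d_k\}} \times_i U_0^{(i)} + \cdots \\
&\quad + S_0 \times_1 \Delta U^{(1)} \times_2 \cdots \times_D \Delta U^{(D)}.
\end{aligned}$$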
Since


AS xi (U^ + A uW) x 2 • • • x D {U ( 0 D) + A U^) 


= IIASI 


k D k 

Sol\x d .AU^ n XiC^ 0 ! ^ ae k W u d . 

7=1 *= 1 7 = 1 


It follows that 


WW-SoX^^ x 2 ---x D U^ ) \\ F 


(D) | 


D 


D 


D 


^ II AS||f + otevn + • • • + yy ]^[ I'dj + • • • + ae 15 JJ| 




d=l 


d! j = l 

dp^dq,Vp#q 


d=l 


< * + £>• + ••• + £ Q ,‘(l) t + ... + ae -(A) D 

d=l d 1 ,...,d fc =l,l^fc<D 

dp^d, q ,Vpjtq 


D 


= £ 


+ a ^ C D £k (l) 


r\ k 


k =1 


< e + a( (1 + re) D - 1 


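Now let us rewrite the set $T_{r,\alpha}$ as follows:

$$\begin{aligned}
T_{r,\alpha} &= \left\{\mathcal{W} \in \mathbb{R}^{n_1\times\cdots\times n_D} \ \Big|\ \|\mathcal{W}\|_F \le \alpha,\ \sum_{d=1}^{D} \operatorname{rank}(\mathcal{W}_{(d)}) = r\right\} \\
&= \left\{S \times_1 U^{(1)} \times_2 \cdots \times_D U^{(D)} \mid \nu \in \mathcal{P}(r,D),\ S \in \mathcal{E}(\nu),\ U^{(d)} \in O_{n_d,\nu_d},\ d = 1,\dots,D\right\}
\end{aligned}$$

and denote

$$\varepsilon' = \varepsilon + \alpha\left((1 + r\varepsilon)^D - 1\right).$$

Summing up our discussion, we conclude that there exists an $\varepsilon'$-net of $T_{r,\alpha}$ with covering
number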


N(p' r) < ^ 


ni^iH- \-n D u D /3q,\ UX-xi/d 





and since the cardinality of $\mathcal{P}(r,D)$ equals $C_{D-1}^{r-1}$, we obtain that


N(e',r) < C r D ~_\ 


max 


u£V{r,D) \£ J 


- \-n D u D /g \ ux-xi/£, 




^ C r D \ max 


v&V(r,D) V e / 


g\ nii/iH-1 -n D u D 


( 3 a 
max — 

v£P{r,D) \ £ 


v\ x--xi/£) 


< c 


r— 1 
D—l 


rn * ^3 a_y /DV 


= C 


-i ( 3 \ rn * (2,a\ rn * +{r/D) 


D-l 


a 


* 


where $n_* = \max\{n_1,\dots,n_D\}$.


□ 
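The counting and bounding steps at the end of the proof can be checked by enumeration on a small case. The sketch below assumes that the rank vectors $\nu$ are tuples of $D$ positive integers summing to $r$ (this is the reading under which their number equals the binomial coefficient $C_{D-1}^{r-1}$), and verifies the two elementary bounds $n_1\nu_1 + \cdots + n_D\nu_D \le r n_*$ and $\nu_1 \times \cdots \times \nu_D \le (r/D)^D$. The function name and the numerical values are hypothetical.

```python
from itertools import combinations
from math import comb, prod

def compositions(r, D):
    """All tuples of D positive integers summing to r (stars and bars)."""
    for cuts in combinations(range(1, r), D - 1):
        bounds = (0,) + cuts + (r,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(D))

r, D = 7, 3
n = (4, 6, 5)            # hypothetical dimensions n_1, ..., n_D
n_star = max(n)

all_nu = list(compositions(r, D))
count = len(all_nu)      # should equal the binomial coefficient C(r-1, D-1)

# Exponent of the covering bound for the orthogonal factors.
max_factor_exponent = max(sum(nd * vd for nd, vd in zip(n, nu)) for nu in all_nu)
# Exponent of the covering bound for the core tensor.
max_core_exponent = max(prod(nu) for nu in all_nu)

print(count)  # 15
```

The second bound is the arithmetic-geometric mean inequality applied to $\nu_1,\dots,\nu_D$, whose sum is $r$.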


4.3. The dual Romera-Paredes-Pontil function of Gaussian tensors. In this section,
we study the expected value of the dual $\omega_\alpha^{*}$ of the Romera-Paredes-Pontil function $\omega_\alpha^{**}$,
evaluated at a Gaussian random tensor. We will need some further notations. For any
vector s € WL N . Using (ESJ). one easily obtains 


(4.15) 


Ua(X) ^ sup (a(X),w) — ll'fi'llo- 
||'H|2<ck 


We also have, by equation (7) in 


u* a {X) < a max \\a\. r (X)\\ 2 - r. 

r=0,...,N 

We have the following result. 

Lemma 4.3. Let

$$\mathcal{X} = \left(\mathcal{X}_{i_1,\dots,i_D}\right)_{i_1=1,\dots,n_1,\ \dots,\ i_D=1,\dots,n_D}$$

be a tensor with i.i.d. standard Gaussian entries. Then

( _Ci i at\D+ 1 _ oD+i 1 / 

< 8 V 2 a [ y/]og(p)- - FrT -^- \--y/Wq(^N + 


+ 


D + l 

2 d+1 - (1 + N) d+1 \\ N(l + N) 
D(D + 1) 


N(l + N) D -2° 

IT 


Proof. Using (4.16) and the tensor Von Neumann inequality [8, Theorem 1], one obtains that


(4.16) 

(4.17) 


ou*(X) < max 

r=l,...,A 


sup (X, W) — r 

I|W|If<“ 

Tid=l rank(W( li )) =r 


Since we must enforce the constraint 
(4.18) E 


< a ^ sup (X, W) — r 

,N 

F < 1, Dudley’s entropy bound says that [16 


r=l,...,N 


i UJ a(X)] = 8V2 a V [ yTog(TTTjjde-r. 

_ , „r JO 


r=l,....N 
Let us compute the integral term. We have


rn„+(r / D) 1 


< 


[ \/iog WW)j de ' 

Jo 

f v / iog(p)(l + rD(l + re) D ~ l )d£ + f y/q\og{Z/e){l + rD( 1 + re) 
Jo Jo 


^1 + rD{\ + re) D l ^jde 


i o 

ir— 1 


where $p = C_{D-1}^{r-1}\, 3^{r n_*}$ and $q = r n_* + (r/D)^D$. We have

rl 


and 


Therefore 


IQ 


i,/log(p)(l + rD( 1 + re) D 1 )de 


>/log(p) + rDy/\og{jp) [ (1 + re) D { de 

Jo 


T+l 1 

z D ~ l -dz 


0og (p) + Vlog (p)((r + l) D - 1) 
0og(p)(r + l) D , 


[ y / q , log(3/e)(l + rD( 1 + re) D l )de 

Jo 


< y/q( 1 + r\D(l + r) D ^ J \/log (3/e)de 


= y/q (1 + rD( 1 + r) 


+oo 

1 x 2 e~ x2 dx 
log 3 


< ^^ifq(l + rD(l + r) D 1 j . 

/ \/iog WW)j de ' 

Jo 

^ \/log(4>)(l + r) D + jy/Kq(l + rD( 1 + r) D_1 ), 


where $p = C_{D-1}^{r-1}\, 3^{r n_*}$ and $q = r n_* + (r/D)^D$. Plugging this result into (4.18)


w< 


(4.19) E [w* (<T)] < 8\/2 a ^ ( v^ogG’Xl + r) D + + rD(l + r) D ~ 

r=l,...,N V 




obtain 
Approximating sums by integrals, we thus obtain


E [«£(*)] <8^2 a y/\og(p) 


(1 + N) d+1 - 2 d+1 1 


+ 


2 d+1 - (1 + N) d+1 
D{D + 1) 


D + 1 

N(1 + N) 


N{1 + N) d -2 d 

IT 


+ 4 ^ 1 ^ + 


as announced.


□ 